
    An investigation of deep learning for image processing applications

    Significant strides have been made in computer vision over the past few years due to recent developments in deep learning, especially deep convolutional neural networks (CNNs). Building on advances in GPU computing, innovative model architectures and large-scale datasets, CNNs have become the workhorse behind state-of-the-art performance on most computer vision tasks. For instance, the most advanced deep CNNs are able to match and even surpass human-level performance in image classification. Deep CNNs have demonstrated the ability to learn very powerful image features, or representations, in a supervised manner. In spite of this impressive performance, however, the learned deep features remain much harder to interpret and understand than traditional hand-crafted ones. It is not clear what has been learned in the deep features or how to apply them to other tasks, such as traditional image processing problems. In this thesis, we explore deep features extracted from pretrained deep convolutional neural networks and develop new techniques, based on these features, to tackle several traditional image processing problems. First, we consider the task of quickly filtering out irrelevant information in an image. In particular, we develop a method for exploiting object-specific channels (OSCs) of pretrained deep CNNs, in which neurons are activated by the presence of specific objects in the input image. Building on the basic OSC features and using face detection as a specific example, we introduce a multi-scale approach to constructing robust face heatmaps for rapidly filtering out non-face regions, thus significantly improving search efficiency for potential face candidates. We then develop a simple and compact face detector for unconstrained settings with state-of-the-art performance. Second, we turn to the task of producing visually pleasing images.
We investigate two generative models, the variational autoencoder (VAE) and the generative adversarial network (GAN), and propose to construct objective functions for training generative models by incorporating pretrained deep CNNs. As a result, high-quality face images can be generated with realistic facial parts such as a clear nose and mouth as well as fine hair texture. Moreover, the learned latent vectors demonstrate the capability of capturing conceptual and semantic information about facial images, which can be used to achieve state-of-the-art performance in facial attribute prediction. Third, we consider image information augmentation and reduction tasks. We propose a deep feature consistent principle for measuring the similarity between two images in feature space. Based on this principle, we investigate several traditional image processing problems involving both image information augmentation (companding and inverse halftoning) and reduction (downscaling, decolorization and HDR tone mapping). The experiments demonstrate the effectiveness of deep learning based solutions to these traditional low-level image processing problems. These approaches enjoy many advantages of neural network models, such as ease of use and deployment and end-to-end training as a single learning problem without hand-crafted features. Last, we investigate objective methods for measuring perceptual image quality and propose a new deep feature based image quality assessment (DFB-IQA) index that measures the inconsistency between a distorted image and its reference image in feature space. The proposed DFB-IQA index performs very well and behaves consistently with subjective mean opinion scores when applied to images corrupted by a variety of different types of distortion. Our work contributes to a growing literature demonstrating the power of deep learning in solving traditional signal processing problems and advances the state of the art on several tasks.
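The deep feature consistent principle above can be sketched as follows: two images are compared by the mean squared error between their activations at several network layers rather than pixel by pixel. The feature extractor below is a toy, hypothetical stand-in (simple multi-scale averaging) for a pretrained CNN such as VGG; it only illustrates the shape of the computation.

```python
# Sketch of the deep feature consistent principle (assumption: a toy
# multi-scale extractor stands in for pretrained-CNN activations).

def toy_features(image, num_layers=3):
    """Hypothetical stand-in for CNN layer activations: each 'layer'
    is the 1-D image at a coarser scale, obtained by pairwise averaging."""
    feats = []
    current = list(image)
    for _ in range(num_layers):
        feats.append(current)
        current = [(current[i] + current[i + 1]) / 2.0
                   for i in range(0, len(current) - 1, 2)]
    return feats

def deep_feature_loss(img_a, img_b):
    """Sum of per-layer mean squared errors between feature maps."""
    total = 0.0
    for fa, fb in zip(toy_features(img_a), toy_features(img_b)):
        total += sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)
    return total

identical = deep_feature_loss([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
different = deep_feature_loss([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
print(identical)  # 0.0
print(different)
```

In the thesis this comparison is done in the feature space of a pretrained network, so the loss penalizes perceptual, not merely pixel-wise, differences.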

    Improving variational autoencoder with deep feature consistent and generative adversarial training

    We present a new method for improving the performance of the variational autoencoder (VAE). In addition to enforcing the deep feature consistent principle, thereby ensuring that the VAE output and its corresponding input image have similar deep features, we also implement a generative adversarial training mechanism to force the VAE to output realistic and natural images. We present experimental results showing that a VAE trained with our new method outperforms the state of the art in generating face images, with much clearer and more natural noses, eyes, teeth and hair textures as well as reasonable backgrounds. We also show that our method can learn powerful embeddings of input face images, which can be used for facial attribute manipulation. Moreover, we propose a multi-view feature extraction strategy to extract effective image representations, which can be used to achieve state-of-the-art performance in facial attribute prediction.
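The combined objective described above — feature consistency plus adversarial realism — can be sketched as a weighted sum. All ingredients here are toy placeholders (assumed names and shapes, not the paper's actual networks): a feature-space reconstruction term and the standard non-saturating generator loss.

```python
import math

# Hedged sketch: L_total = L_feature + lambda * L_adversarial.
# feat_* are stand-ins for CNN activations; disc_score is a hypothetical
# discriminator output in (0, 1].

def feature_consistency_loss(feat_real, feat_recon):
    """MSE between features of the input and the VAE reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(feat_real, feat_recon)) / len(feat_real)

def generator_adversarial_loss(disc_score_on_fake):
    """Non-saturating GAN generator loss: -log D(G(z))."""
    return -math.log(max(disc_score_on_fake, 1e-12))

def vae_gan_objective(feat_real, feat_recon, disc_score, lam=0.5):
    return (feature_consistency_loss(feat_real, feat_recon)
            + lam * generator_adversarial_loss(disc_score))

perfect = vae_gan_objective([1.0, 2.0], [1.0, 2.0], disc_score=1.0)
print(perfect)  # 0.0: features match and the discriminator is fully fooled
```

A poor reconstruction or an unconvinced discriminator both raise the objective, which is what drives the VAE toward sharp, realistic outputs.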

    End-to-End Single Image Fog Removal using Enhanced Cycle Consistent Adversarial Networks

    Single image defogging is a classical and challenging problem in computer vision. Existing approaches mainly comprise handcrafted-prior-based methods, which rely on the atmospheric degradation model, and learning-based approaches, which require paired fog/fog-free training images. In practice, however, prior-based methods are prone to failure due to their own limitations, and paired training data are extremely difficult to acquire. Inspired by the CycleGAN framework, we have developed an end-to-end learning system that uses unpaired fog and fog-free training images, adversarial discriminators and cycle consistency losses to automatically construct a fog removal system. Like CycleGAN, our system has two transformation paths: one maps fog images to the fog-free image domain and the other maps fog-free images to the fog image domain. Instead of a one-stage mapping, our system uses a two-stage mapping strategy in each transformation path to enhance the effectiveness of fog removal. Furthermore, we make explicit use of prior knowledge in the networks by embedding the atmospheric degradation principle and a sky prior in the mapping from fog-free images to the fog image domain. In addition, we contribute the first real-world natural fog/fog-free image dataset for defogging research. Our multiple real fog images dataset (MRFID) contains images of 200 natural outdoor scenes. For each scene, there are one clear image and four corresponding foggy images of different fog densities, manually selected from a sequence of images taken by a fixed camera over the course of one year. Qualitative and quantitative comparisons against several state-of-the-art methods on both synthetic and real-world images demonstrate that our approach is effective and performs favorably in recovering a clear image from a foggy one. Comment: Submitted to IEEE Transactions on Image Processing
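The atmospheric degradation model embedded in the networks above is the standard scattering equation I(x) = J(x)·t(x) + A·(1 − t(x)), where J is the clear scene radiance, t the transmission and A the airlight. A minimal per-pixel sketch (function names are illustrative, not from the paper):

```python
# Standard atmospheric scattering model used as a prior in defogging work:
#   foggy = clear * t + airlight * (1 - t)

def add_fog(clear_pixel, transmission, airlight):
    """Synthesize a foggy pixel from a clear one (forward model)."""
    return clear_pixel * transmission + airlight * (1.0 - transmission)

def remove_fog(foggy_pixel, transmission, airlight, t_min=0.1):
    """Invert the model; clamp t to avoid division blow-up in dense fog."""
    t = max(transmission, t_min)
    return (foggy_pixel - airlight * (1.0 - t)) / t

foggy = add_fog(0.6, transmission=0.5, airlight=0.9)
recovered = remove_fog(foggy, transmission=0.5, airlight=0.9)
print(foggy, recovered)
```

In practice t and A are unknown and must be estimated (or, as in the approach above, learned implicitly), which is exactly why embedding this model as prior knowledge helps constrain the mapping from fog-free to foggy images.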

    Deep Reinforcement Learning based Patch Selection for Illuminant Estimation

    Previous deep learning based approaches to illuminant estimation either resized the raw image to a lower resolution or randomly cropped image patches for the deep learning model. Such practices, however, inevitably lead to information loss or to the selection of noisy patches, which affects estimation accuracy. In this paper, we treat patch selection in neural network based illuminant estimation as a control problem of selecting image patches, so as to remove noisy patches and improve estimation accuracy. To achieve this, we construct a selection network (SeNet) that learns a patch selection policy. Based on data statistics and the learning progress of the deep illuminant estimation network (DeNet), the SeNet decides which training patches should be fed to the DeNet, which in turn gives feedback to the SeNet so that it can update its selection policy. To achieve such interactive and intelligent learning, we optimize the SeNet with a reinforcement learning approach known as policy gradient. We show that the proposed learning strategy can enhance illuminant estimation accuracy, speed up convergence and improve the stability of DeNet training. We evaluate our method on two public datasets and demonstrate that it outperforms state-of-the-art approaches.
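The policy gradient loop described above can be sketched in miniature: a one-parameter logistic policy decides whether to select a patch, and a scalar reward (standing in for the DeNet's accuracy feedback) nudges the parameter via the score-function (REINFORCE) gradient. Every name and the reward function here are illustrative assumptions, not the paper's actual SeNet/DeNet.

```python
import math
import random

def select_prob(theta, feature):
    """Logistic policy: probability of selecting a patch with this feature."""
    return 1.0 / (1.0 + math.exp(-theta * feature))

def reinforce_step(theta, feature, reward_fn, lr=0.1):
    """One REINFORCE update of the selection parameter."""
    p = select_prob(theta, feature)
    action = 1 if random.random() < p else 0   # sample select (1) / skip (0)
    reward = reward_fn(action)
    # score-function gradient: d log pi(action) / d theta = (action - p) * feature
    return theta + lr * reward * (action - p) * feature

random.seed(0)
theta = 0.0
# Toy reward: selecting the (informative) patch yields reward 1, skipping 0,
# so the policy should learn to select.
for _ in range(200):
    theta = reinforce_step(theta, feature=1.0, reward_fn=lambda a: float(a))
print(select_prob(theta, 1.0))  # well above 0.5: the policy favors selecting
```

In the actual system, the reward would come from the change in DeNet's estimation error, so the SeNet learns to route informative patches to training while filtering noisy ones out.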